The diversity index for Chicago
Read only the fields needed for the diversity index from data/il2020.pl_COMBINED_TRACT.csv, then use the diversity_index function to calculate the index for each tract.
The diversity index is the probability that two people picked at random from a population are of different races. It is calculated as one minus the probability that both people are of the same race.
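As a quick sanity check of that definition, the sketch below computes the index for a few made-up populations. The group counts are illustrative, not census data, and the function name `simple_diversity` is hypothetical. It assumes the two people are drawn independently (with replacement), which is what the squared-share formula used in this section implies.

```r
# Illustrative only: hypothetical group counts, not census data.
# Probability that two independently drawn people belong to different groups.
simple_diversity <- function(counts) {
  shares <- counts / sum(counts)  # share of each group in the population
  1 - sum(shares^2)               # 1 - P(both people are from the same group)
}

simple_diversity(c(50, 50))      # two equal groups: 0.5
simple_diversity(c(100, 0))      # everyone in one group: 0
simple_diversity(c(60, 30, 10))  # 1 - (0.36 + 0.09 + 0.01) = 0.54
```

The index is 0 when everyone belongs to one group and approaches 1 as the population spreads evenly over many groups.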
To define race in this context, the variables that break out population by Hispanic / Non-Hispanic are used, and for simplicity of calculation the (relatively small) multiracial population is treated as a single race. This simplifying assumption has little effect on the result and makes the index much easier to compute. Because the index is computed consistently, it is also reasonable to use for comparing regions.
Relevant fields for Diversity Index:
P0020001 - Total population
P0020002 - Hispanic or Latino
P0020005 - Not Hispanic or Latino: White alone
P0020006 - Not Hispanic or Latino: Black or African American alone
P0020007 - Not Hispanic or Latino: American Indian and Alaska Native alone
P0020008 - Not Hispanic or Latino: Asian alone
P0020009 - Not Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone
P0020010 - Not Hispanic or Latino: Some Other Race alone
P0020011 - Not Hispanic or Latino: Population of two or more races

Other header fields: GEOID, GEOCODE, STATE, COUNTY, SUMLEV, TRACT, BLKGRP, BLOCK, CSA, BASENAME, POP100, HU100, INTPTLAT, INTPTLON
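One way to read only these fields is the `select` argument of `data.table::fread`. The sketch below is an assumption about how such a reader could look, not the project's own `fread_census`. The helper name `read_di_fields` is hypothetical, and the short column names (TOT, H, W, ...) are assumed to map to the P002 fields in the order listed.

```r
library(data.table)

# Hypothetical helper, not the project's fread_census(): read only the
# columns needed for the diversity index and give them short names.
read_di_fields <- function(path) {
  # Assumed mapping from short names to census field codes
  fields <- c(
    TOT  = "P0020001", H    = "P0020002", W     = "P0020005",
    B    = "P0020006", AIAN = "P0020007", ASIAN = "P0020008",
    NHPI = "P0020009", SOR  = "P0020010", MULTI = "P0020011"
  )
  dt <- fread(
    path,
    select     = c("GEOID", unname(fields)),  # skip every other column
    colClasses = list(character = "GEOID")    # keep the identifier as text
  )
  setnames(dt, old = unname(fields), new = names(fields))
  dt[]
}
```

A call such as `read_di_fields("data/il2020.pl_COMBINED_TRACT.csv")` would then return one row per tract, assuming the file carries a GEOID column under that exact header.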
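Formula for Diversity Index: 1 - ((H/TOT)^2 + (W/TOT)^2 + (B/TOT)^2 + (AIAN/TOT)^2 + (ASIAN/TOT)^2 + (NHPI/TOT)^2 + (SOR/TOT)^2 + (MULTI/TOT)^2)
Here TOT, H, W, B, AIAN, ASIAN, NHPI, SOR, and MULTI correspond, in order, to the nine P002 fields listed above.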
The logic for these computations is encapsulated in the R functions in R/functions.
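Those functions are not reproduced here. As a rough illustration of the formula only, the following sketch computes the index row by row for a table with the columns TOT, H, W, B, AIAN, ASIAN, NHPI, SOR, and MULTI. The name `diversity_index_sketch` is hypothetical, and the project's own `diversity_index` may differ in detail.

```r
# Hypothetical re-implementation for illustration; the project's own
# diversity_index() lives in R/functions and may differ in detail.
diversity_index_sketch <- function(df) {
  groups <- c("H", "W", "B", "AIAN", "ASIAN", "NHPI", "SOR", "MULTI")
  counts <- as.matrix(as.data.frame(df)[, groups, drop = FALSE])
  shares <- counts / df$TOT  # row i is divided by TOT[i]
  1 - rowSums(shares^2)      # one index value per row
}

# Counts for the first tract in the output shown in this section
tract <- data.frame(TOT = 4644, H = 93, W = 4231, B = 117, AIAN = 9,
                    ASIAN = 42, NHPI = 7, SOR = 9, MULTI = 136)
diversity_index_sketch(tract)  # approximately 0.16797, in line with that tract's DI
```

In this sketch a tract with a total population of zero yields NaN, because every share is 0/0.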
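# fread_census and di_race_var_table are project helpers defined in R/functions
pl_2020_tract <- "data/il2020.pl_COMBINED_TRACT.csv" %>%
  fread_census %>%      # read the tract-level file
  di_race_var_table     # reduce it to the race variables used by the index
# Add the diversity index as a new column
pl_2020_tract$DI <- diversity_index(pl_2020_tract)
pl_2020_tract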
## STATE COUNTY TRACT GEO_ID LSAD_NAME TOT H W B AIAN ASIAN
## 1: 17 001 000100 17001000100 NA 4644 93 4231 117 9 42
## 2: 17 001 000201 17001000201 NA 2067 38 1827 79 4 18
## 3: 17 001 000202 17001000202 NA 2870 98 2393 183 8 23
## 4: 17 001 000400 17001000400 NA 3793 67 2953 528 11 3
## 5: 17 001 000500 17001000500 NA 1719 31 1366 198 3 4
## ---
## 3261: 17 203 030501 17203030501 NA 7842 200 7194 55 3 94
## 3262: 17 203 030502 17203030502 NA 2387 57 2176 9 5 12
## 3263: 17 203 030601 17203030601 NA 6324 132 5835 93 2 33
## 3264: 17 203 030602 17203030602 NA 3597 45 3426 9 8 22
## 3265: 17 203 030700 17203030700 NA 4532 87 4229 54 2 12
## NHPI SOR MULTI DI
## 1: 7 9 136 0.16797006
## 2: 0 13 88 0.21500863
## 3: 0 4 161 0.29632847
## 4: 3 23 205 0.37121916
## 5: 2 14 101 0.35141378
## ---
## 3261: 2 19 275 0.15635679
## 3262: 2 0 126 0.16557604
## 3263: 1 15 213 0.14685054
## 3264: 0 14 73 0.09218707
## 3265: 0 3 145 0.12770402
A histogram of the index: